Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method
Authors
Abstract
We consider large scale empirical risk minimization (ERM) problems, where both the problem dimension and variable size are large. In these cases, most second-order methods are infeasible due to the high cost of both computing the Hessian over all samples and computing its inverse in high dimensions. In this paper, we propose a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. We show that while we geometrically increase the size of the training set at each stage, a single iteration of the truncated Newton method is sufficient to solve the new ERM within its statistical accuracy. Moreover, for a large number of samples we are allowed to double the size of the training set at each stage, and the proposed method subsequently reaches the statistical accuracy of the full training set approximately after two effective passes. In addition to this theoretical result, we show empirically on a number of well-known data sets that the proposed truncated adaptive sample size algorithm outperforms stochastic alternatives for solving ERM problems.
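The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the L2-regularized logistic loss, the fixed regularizer `reg`, the warm-up iterations on the first subset, and the rule that replaces the discarded curvature with the smallest kept eigenvalue are all assumptions made for illustration, and every function name is hypothetical. For readability the sketch uses a dense `np.linalg.eigh`; a large-scale version would presumably compute only the top `k` eigenpairs with an iterative solver.

```python
import numpy as np


def objective(w, X, y, reg):
    """L2-regularized logistic loss; labels y are in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * reg * (w @ w)


def grad_and_hessian(w, X, y, reg):
    """Gradient and Hessian of `objective` on the samples (X, y)."""
    z = y * (X @ w)
    s = np.exp(-np.logaddexp(0.0, z))      # stable 1 / (1 + exp(z))
    g = -(X.T @ (y * s)) / len(y) + reg * w
    H = (X.T * (s * (1.0 - s))) @ X / len(y) + reg * np.eye(X.shape[1])
    return g, H


def truncated_newton_step(w, X, y, reg, k):
    """One Newton-type step built from the k largest Hessian eigenpairs."""
    g, H = grad_and_hessian(w, X, y, reg)
    vals, vecs = np.linalg.eigh(H)         # eigenvalues in ascending order
    lam, U = vals[-k:], vecs[:, -k:]       # keep the top-k eigenpairs
    coef = U.T @ g
    # Assumption: curvature in the discarded directions is replaced by the
    # smallest kept eigenvalue lam[0], which gives a conservative step there.
    step = U @ (coef / lam) + (g - U @ coef) / lam[0]
    return w - step


def adaptive_truncated_newton(X, y, reg=0.1, m0=128, k=5, growth=2, warmup=5):
    """Grow the training subset geometrically, one step per stage."""
    n, p = X.shape
    m = min(m0, n)
    w = np.zeros(p)
    for _ in range(warmup):                # solve the small initial problem
        w = truncated_newton_step(w, X[:m], y[:m], reg, k)
    sizes = [m]
    while m < n:
        m = min(growth * m, n)             # enlarge the subset
        w = truncated_newton_step(w, X[:m], y[:m], reg, k)  # single step
        sizes.append(m)
    return w, sizes
```

With `growth=2` and a full set of N = 2^j · m0 samples, the stage sizes m0 + 2·m0 + ... + N sum to 2N − m0, which is one way to read the "two effective passes" in the abstract. Calling `adaptive_truncated_newton(X, y)` on a feature matrix `X` and a label vector `y` with entries in {-1, +1} returns the final iterate and the list of stage sizes.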
Similar resources
Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
We consider empirical risk minimization for large-scale datasets. We introduce Ada Newton as an adaptive algorithm that uses Newton’s method with adaptive sample sizes. The main idea of Ada Newton is to increase the size of the training set by a factor larger than one in a way that the minimization variable for the current training set is in the local neighborhood of the optimal argument of the...
Comparison of advanced large-scale minimization algorithms for the solution of inverse ill-posed problems
We compare the performance of several robust large-scale minimization algorithms for the unconstrained minimization of an ill-posed inverse problem. The parabolized Navier-Stokes equations model was used for adjoint parameter estimation. The methods compared consist of two versions of the nonlinear conjugate gradient method (CG), Quasi-Newton (BFGS), the limited memory Quasi-Newton (L-BFGS) [15...
Comparison of advanced large-scale minimization algorithms for the solution of inverse ill-posed problems
We compare the performance of several robust large-scale minimization algorithms for the unconstrained minimization of an ill-posed inverse problem. The parabolized Navier–Stokes equation model was used for adjoint parameter estimation. The methods compared consist of three versions of nonlinear conjugate-gradient (CG) method, quasi-Newton Broyden–Fletcher–Goldfarb–Shanno (BFGS), the limited-mem...
The truncated Newton method for Full Waveform Inversion
Full Waveform Inversion (FWI) is a promising seismic imaging method. It aims at computing quantitative estimates of the subsurface parameters (bulk wave velocity, shear wave velocity, rock density) from local measurements of the seismic wavefield. Based on a particular wave propagation engine for wavefield estimation, it consists in minimizing iteratively the distance between the predicted wave...
Truncated-Newton Training Algorithm for
We present an estimate approach to compute the viscoplastic behavior of a polymer matrix composite (PMC) under different thermomechanical environments. This investigation incorporates computational neural network as the tool for determining the creep behavior of the composite. We propose a new second-order learning algorithm for training the multilayer networks. Training in the neural network ...
Journal: CoRR
Volume: abs/1705.07957
Publication year: 2017